Abstract

We prove an apparently novel concentration of measure result for Markov tree processes. 
The bound we derive reduces to the known bounds for Markov processes when the tree is a 
chain, thus strictly generalizing the known Markov process concentration results. We employ 
several techniques of potential independent interest, especially for obtaining similar results for 
more general directed acyclic graphical models. 

1 Introduction 

An emerging paradigm for proving concentration results for nonproduct measures is to quantify 
the dependence between the variables and state the bounds in terms of that dependence (see 
[3] for an overview). A process (measure) particularly amenable to this approach is the Markov
process. Using different techniques, Marton (coupling method |SJ, 1996), Samson (log-Sobolev 
inequality [8], 2000) and Kontorovich and Ramanan (martingale differences [3], 2006) have obtained
qualitatively similar concentration of measure results for Markov processes. One natural
generalization of the Markov process is the hidden Markov process; we proved a concentration 
result for this class in 0. A different way to generalize the Markov process is via the Markov 
tree process, which we address in the present paper. 

If $(S^n, d)$ is a metric space and $(X_i)_{1 \le i \le n}$, $X_i \in S$, is a random process, a measure
concentration result (for the purposes of this paper) is an inequality stating that for any 1-Lipschitz
(with respect to $d$) function $f : S^n \to \mathbb{R}$, we have

$$P\{|f(X) - \mathbb{E}f(X)| > t\} \le 2\exp(-Kt^2), \tag{1}$$

where $K$ may depend on $n$ but not on $f$.¹

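The quantity $\bar\eta_{ij}$, defined below, has proved useful for obtaining concentration results. For
$1 \le i < j \le n$, $y \in S^{i-1}$ and $w \in S$, let

$$\mathcal{L}(X_j^n \mid X_1^{i-1} = y, X_i = w)$$

be the law of $X_j^n$ conditioned on $X_1^{i-1} = y$ and $X_i = w$. Define

$$\eta_{ij}(y, w, w') = \left\| \mathcal{L}(X_j^n \mid X_1^{i-1} = y, X_i = w) - \mathcal{L}(X_j^n \mid X_1^{i-1} = y, X_i = w') \right\|_{TV} \tag{2}$$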



1 See \E\ for a much more general notion of concentration. 
and

$$\bar\eta_{ij} = \sup_{y \in S^{i-1}} \ \sup_{w, w' \in S} \eta_{ij}(y, w, w'),$$

where $\|\cdot\|_{TV}$ is the total variation norm (see §2.1 to clarify notation).

Let $\Gamma$ and $\Delta$ be upper-triangular $n \times n$ matrices, with $\Gamma_{ii} = \Delta_{ii} = 1$ and

$$\Gamma_{ij} = \sqrt{\bar\eta_{ij}}, \qquad \Delta_{ij} = \bar\eta_{ij}$$

for $1 \le i < j \le n$.

For the case where $S = [0, 1]$ and $d$ is the Euclidean metric on $\mathbb{R}^n$, Samson [5] showed that
if $f : [0, 1]^n \to \mathbb{R}$ is convex and Lipschitz with $\|f\|_{Lip} \le 1$, then

$$P\{|f(X) - \mathbb{E}f(X)| > t\} \le 2\exp\left(-\frac{t^2}{2\|\Gamma\|_2^2}\right) \tag{3}$$

where $\|\Gamma\|_2$ is the $\ell_2$ operator norm of the matrix $\Gamma$; Marton [7] has a comparable result.
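For the case where $S$ is countable and $d$ is the (normalized) Hamming metric on $S^n$,

$$d(x, y) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}_{\{x_i \ne y_i\}},$$

Kontorovich and Ramanan [3] showed that if $f : S^n \to \mathbb{R}$ is Lipschitz with $\|f\|_{Lip} \le 1$, then

$$P\{|f(X) - \mathbb{E}f(X)| > t\} \le 2\exp\left(-\frac{nt^2}{2\|\Delta\|_\infty^2}\right) \tag{4}$$

where $\|\Delta\|_\infty$ is the $\ell_\infty$ operator norm of the matrix $\Delta$, also given by

$$\|\Delta\|_\infty = \max_{1 \le i < n} \left(1 + \bar\eta_{i,i+1} + \cdots + \bar\eta_{i,n}\right). \tag{5}$$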

This leads to a strengthening of the Markov measure concentration result in Marton [H] . 

The sharpest currently known Markov measure concentration results were obtained in [S] 
and in terms of the contraction coefficients $(\theta_i)_{1 \le i < n}$ of the Markov process:

$$\bar\eta_{ij} \le \theta_i \theta_{i+1} \cdots \theta_{j-1}. \tag{6}$$

In this paper, we prove a bound on $\bar\eta_{ij}$ in terms of the contraction coefficients of the Markov
tree process (Theorem 2.1). This bound is cumbersome to state without preliminary definitions,
but it reduces to (6) in the case where the Markov tree is a chain.



2 Bounding $\bar\eta_{ij}$ for Markov tree processes
2.1 Notational preliminaries 

Random variables are capitalized ($X$), specified state sequences are written in lowercase ($x$), the
shorthand $X_i^j = X_i \ldots X_j$ is used for all sequences, and the concatenation of the sequences $x$
and $y$ is denoted by $xy$, as in $x_1^i x_{i+1}^j = x_1^j$. Another way to index collections of variables is by
subset: if $I = \{i_1, i_2, \ldots, i_m\}$ then we write $x_I = x[I] = \{x_{i_1}, x_{i_2}, \ldots, x_{i_m}\}$; we will write $x_I$ and
$x[I]$ interchangeably, as dictated by convenience. To avoid cumbersome subscripts, we will also
occasionally use the bracket notation for vector components. Thus, if $u \in \mathbb{R}^{S^I}$, then

$$u_{x_I} = u_{x[I]} = u[x[I]] = u_{(x_{i_1}, x_{i_2}, \ldots, x_{i_m})} \in \mathbb{R}$$

for each $x[I] \in S^I$. A similar bracket notation will apply for matrices.

We will use $|\cdot|$ to denote set cardinalities. Sums will range over the entire space of the
summation variable; thus $\sum_{x_i^j} f(x_i^j)$ stands for $\sum_{x_i^j \in S^{j-i+1}} f(x_i^j)$, and $\sum_{x[I]} f(x[I])$ is shorthand for
$\sum_{x[I] \in S^I} f(x[I])$.

The probability operator $P\{\cdot\}$ is defined with respect to the measure space specified in context.

We will write $[n]$ for the set $\{1, \ldots, n\}$. Anytime $\|\cdot\|$ appears without a subscript, it will
always denote the total variation norm $\|\cdot\|_{TV}$, which we define here, for any signed measure $\tau$
on a countable set $\mathcal{X}$, by

$$\|\tau\|_{TV} = \frac12 \sum_{x \in \mathcal{X}} |\tau(x)|. \tag{7}$$

If $G = (V, E)$ is a graph, we will frequently abuse notation and write $u \in G$ instead of $u \in V$,
blurring the distinction between a graph and its vertex set. This notation will carry over to
set-theoretic operations ($G = G_1 \cap G_2$) and indexing of variables (e.g., $X_G$).

Unless we need to refer explicitly to a $\sigma$-algebra, we will suppress it in the probability
space notation, using less rigorous formulations, such as "Let $\mu$ be a measure on $S^n$". Furthermore,
to avoid the technical but inessential complications associated with infinite sets, we
will take $S$ to be finite in this paper, noting only that the bounds carry over unchanged to the
countable case (as done in [3] and [?]). To extend the results to the continuous case, some mild
measure-theoretic assumptions are needed (see [?]).

2.2 Definition of Markov tree process 
2.2.1 Graph-theoretic preliminaries 

Consider a directed acyclic graph $G = (V, E)$, and define a partial order $\prec_G$ on $G$ by the
transitive closure of the relation

$$u \prec_G v \quad \text{if } (u, v) \in E.$$

We define the parents and children of $v \in V$ in the natural way:

$$\mathrm{parents}(v) = \{u \in V : (u, v) \in E\}$$

and

$$\mathrm{children}(v) = \{w \in V : (v, w) \in E\}.$$

If $G$ is connected and each $v \in V$ has at most one parent, $G$ is called a (directed) tree.
In a tree, whenever $u \prec_G v$ there is a unique directed path from $u$ to $v$. A tree $T$ always has a
unique minimal (w.r.t. $\prec_T$) element $r_0 \in V$, called its root. Thus, for every $v \in V$ there is a
unique directed path $r_0 \prec_T r_1 \prec_T \cdots \prec_T r_d = v$; define the depth of $v$, $\mathrm{dep}_T(v) = d$, to be the
length (i.e., number of edges) of this path. Note that $\mathrm{dep}_T(r_0) = 0$. We define the depth of the
tree by $\mathrm{dep}(T) = \sup_{v \in T} \mathrm{dep}_T(v)$.

For $d = 0, 1, \ldots$ define the $d$th level of the tree $T$ by

$$\mathrm{lev}_T(d) = \{v \in V : \mathrm{dep}_T(v) = d\};$$

note that the levels induce a disjoint partition on $V$:

$$V = \bigcup_{d=0}^{\mathrm{dep}(T)} \mathrm{lev}_T(d).$$
We define the width of a tree as the greatest number of nodes in any level:

$$\mathrm{wid}(T) = \sup_{0 \le d \le \mathrm{dep}(T)} |\mathrm{lev}_T(d)|. \tag{8}$$

We will consistently take $|V| = n$ for finite $V$. An ordering $J : V \to \mathbb{N}$ of the nodes is said to
be breadth-first if

$$\mathrm{dep}_T(u) < \mathrm{dep}_T(v) \implies J(u) < J(v). \tag{9}$$

Since every directed tree $T = (V, E)$ has some breadth-first ordering,² we shall henceforth blur
the distinction between $v \in V$ and $J(v)$, simply taking $V = [n]$ (or $V = \mathbb{N}$) and assuming that
$\mathrm{dep}_T(u) < \mathrm{dep}_T(v) \implies u < v$ holds. This will allow us to write $S^V$ simply as $S^n$ for any set $S$.

Note that we have two orders on $V$: the partial order $\prec_T$, induced by the tree topology, and
the total order $<$, given by the breadth-first enumeration. Observe that $i \prec_T j$ implies $i < j$
but not vice versa.

If $T = (V, E)$ is a tree and $u \in V$, we define the subtree induced by $u$, $T_u = (V_u, E_u)$, by
$V_u = \{v \in V : u \preceq_T v\}$, $E_u = \{(v, w) \in E : v, w \in V_u\}$.
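The graph-theoretic notions of this subsection (depth, levels, width, breadth-first ordering, induced subtree) can be made concrete with a small sketch. The representation of a tree as a dictionary mapping each node to its parent, the helper names, and the six-node example tree are all assumptions made for illustration; they are not part of the paper.

```python
from collections import defaultdict

def depths(parent):
    """Depth of every node; `parent` maps node -> parent, the root maps to None."""
    dep = {}
    def depth(v):
        if v not in dep:
            dep[v] = 0 if parent[v] is None else depth(parent[v]) + 1
        return dep[v]
    for v in parent:
        depth(v)
    return dep

def levels(parent):
    """Map each depth d to the list of nodes in the d-th level."""
    lev = defaultdict(list)
    for v, d in depths(parent).items():
        lev[d].append(v)
    return dict(lev)

def width(parent):
    """Greatest number of nodes in any level."""
    return max(len(nodes) for nodes in levels(parent).values())

def breadth_first_order(parent):
    """One breadth-first ordering J: list the levels in ascending order,
    breaking ties inside a level by sorting the node labels."""
    lev = levels(parent)
    order = [v for d in sorted(lev) for v in sorted(lev[d])]
    return {v: k + 1 for k, v in enumerate(order)}

def subtree_nodes(parent, u):
    """Vertex set of the subtree induced by u: u and all of its descendants."""
    result = set()
    for v in parent:
        w = v
        while w is not None:
            if w == u:
                result.add(v)
                break
            w = parent[w]
    return result

# Hypothetical example tree: r -> a, r -> b, a -> c, a -> d, b -> e
parent = {"r": None, "a": "r", "b": "r", "c": "a", "d": "a", "e": "b"}
J = breadth_first_order(parent)
```

Any other tie-breaking rule inside a level would also give a breadth-first ordering; sorting the labels is only a convenient deterministic choice.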

2.2.2 Markov tree measure 

If $S$ is a finite set, a Markov tree measure $\mu$ is defined on $S^n$ by a tree $T = (V, E)$ and
transition kernels $p_0$, $\{p_{ij}(\cdot \mid \cdot)\}_{(i,j) \in E}$. Continuing our convention in §2.2.1, we have the
breadth-first total order $<$ and the partial order $\prec_T$ on $V$, and take $V = \{1, \ldots, n\}$. Together, the topology of
$T$ and the transition kernels determine the measure $\mu$ on $S^n$:

$$\mu(x) = p_0(x_1) \prod_{(i,j) \in E} p_{ij}(x_j \mid x_i). \tag{10}$$

A measure on $S^n$ satisfying (10) for some $T$ and $\{p_{ij}\}$ is said to be compatible with tree $T$; a
measure is a Markov tree measure if it is compatible with some tree.
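As an illustration of the product formula that defines a Markov tree measure, the following sketch evaluates it on a hypothetical four-node tree over a two-letter alphabet and sums it over all sequences. The kernel values, the edge set, and the function name are invented for the example; each kernel is stored with rows indexed by the parent state.

```python
from itertools import product

def tree_measure(x, p0, kernels):
    """mu(x) = p0(x_1) * product over edges (i, j) of p_ij(x_j | x_i).

    x is a tuple (x_1, ..., x_n); kernels maps an edge (i, j) to a matrix k
    with k[a][b] = p_ij(b | a), so every row of k sums to one.
    """
    value = p0[x[0]]
    for (i, j), k in kernels.items():
        value *= k[x[i - 1]][x[j - 1]]
    return value

# Hypothetical example: S = {0, 1}, tree with edges 1->2, 1->3, 2->4
p0 = [0.3, 0.7]
kernels = {
    (1, 2): [[0.9, 0.1], [0.2, 0.8]],
    (1, 3): [[0.6, 0.4], [0.5, 0.5]],
    (2, 4): [[0.7, 0.3], [0.1, 0.9]],
}

# Summing over all of S^4 should give total mass one.
total = sum(tree_measure(x, p0, kernels) for x in product(range(2), repeat=4))
```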

Suppose $S$ is a finite set and $(X_i)_{i \in \mathbb{N}}$, $X_i \in S$, is a random process defined on $(S^{\mathbb{N}}, P)$. If for
each $n > 0$ there is a tree $T^{(n)} = ([n], E^{(n)})$ and a Markov tree measure $\mu_n$ compatible with $T^{(n)}$
such that for all $x \in S^n$ we have

$$P\{X_1^n = x\} = \mu_n(x)$$

then we call $X$ a Markov tree process. The trees $\{T^{(n)}\}$ are easily seen to be consistent in the
sense that $T^{(n)}$ is an induced subgraph of $T^{(n+1)}$. So corresponding to any Markov tree process
is the unique infinite tree $T = (\mathbb{N}, E)$. The uniqueness of $T$ is easy to see, since for $v > 1$, the
parent of $v$ is the smallest $u \in \mathbb{N}$ such that

$$P\{X_v = x_v \mid X_1^{v-1} = x_1^{v-1}\} = P\{X_v = x_v \mid X_u = x_u\};$$

thus $P$ determines the topology of $T$.

It is straightforward to verify that a Markov tree process $\{X_v\}_{v \in T}$ compatible with tree $T$
has the following Markov property: if $v$ and $v'$ are children of $u$ in $T$, then

$$P\{X_{T_v} = x, X_{T_{v'}} = x' \mid X_u = y\} = P\{X_{T_v} = x \mid X_u = y\}\, P\{X_{T_{v'}} = x' \mid X_u = y\}.$$

In other words, the subtrees induced by the children are conditionally independent given the
parent; this follows directly from the definition of the Markov tree measure in (10).

2 One can easily construct a breadth-first ordering on a given tree by ordering the nodes arbitrarily within each
level and listing the levels in ascending order: $\mathrm{lev}_T(0), \mathrm{lev}_T(1), \ldots$.
2.3 Statement of result

Theorem 2.1. Let $S$ be a finite set and let $(X_i)_{1 \le i \le n}$, $X_i \in S$, be a Markov tree process, defined
by a tree $T = (V, E)$ and transition kernels $p_0$, $\{p_{uv}(\cdot \mid \cdot)\}_{(u,v) \in E}$. Define the $(u,v)$-contraction
coefficient $\theta_{uv}$ by

$$\theta_{uv} = \max_{y, y' \in S} \|p_{uv}(\cdot \mid y) - p_{uv}(\cdot \mid y')\|. \tag{11}$$

Suppose $\max_{(u,v) \in E} \theta_{uv} \le \theta < 1$ for some $\theta$ and $\mathrm{wid}(T) \le L$. Then for the Markov tree process
$X$ we have

$$\bar\eta_{ij} \le \left(1 - (1 - \theta)^L\right)^{\lfloor (j-i)/L \rfloor} \tag{12}$$

for $1 \le i < j \le n$.
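To see the theorem at work, one can compute the mixing coefficients by brute force on a tiny example and compare them with the stated bound. The sketch below does this for a hypothetical five-node tree (edges 1->2, 1->3, 2->4, 3->5, so the width is 2) over a binary alphabet; the kernel values and all function names are assumptions chosen for illustration, and the enumeration is only feasible for very small $n$.

```python
from itertools import product

def joint_law(p0, kernels, n, num_states):
    """Joint measure on S^n; kernels[(i, j)][a][b] = p_ij(b | a)."""
    mu = {}
    for x in product(range(num_states), repeat=n):
        value = p0[x[0]]
        for (i, j), k in kernels.items():
            value *= k[x[i - 1]][x[j - 1]]
        mu[x] = value
    return mu

def conditional_law(mu, prefix, j):
    """Law of X_j^n given X_1^i = prefix (i = len(prefix)), by direct summation."""
    law = {}
    for x, m in mu.items():
        if x[:len(prefix)] == prefix:
            law[x[j - 1:]] = law.get(x[j - 1:], 0.0) + m
    total = sum(law.values())
    return {tail: m / total for tail, m in law.items()}

def eta_bar(mu, i, j, num_states):
    """Supremum over y, w, w' of the total variation distance between the
    two conditional laws of X_j^n."""
    best = 0.0
    for y in product(range(num_states), repeat=i - 1):
        for w in range(num_states):
            for w2 in range(num_states):
                a = conditional_law(mu, y + (w,), j)
                b = conditional_law(mu, y + (w2,), j)
                best = max(best, 0.5 * sum(abs(a[t] - b[t]) for t in a))
    return best

def theorem_bound(theta, L, i, j):
    """Right-hand side of the bound in Theorem 2.1."""
    return (1 - (1 - theta) ** L) ** ((j - i) // L)

# Hypothetical kernels; each has contraction coefficient at most theta = 0.3.
n, num_states, theta, L = 5, 2, 0.3, 2
p0 = [0.4, 0.6]
kernels = {
    (1, 2): [[0.7, 0.3], [0.4, 0.6]],
    (1, 3): [[0.6, 0.4], [0.5, 0.5]],
    (2, 4): [[0.8, 0.2], [0.5, 0.5]],
    (3, 5): [[0.3, 0.7], [0.5, 0.5]],
}
mu = joint_law(p0, kernels, n, num_states)
results = {(i, j): (eta_bar(mu, i, j, num_states), theorem_bound(theta, L, i, j))
           for i in range(1, n) for j in range(i + 1, n + 1)}
```

Each entry of `results` pairs a brute-force coefficient with the corresponding bound; all kernels here are strictly positive, so every conditioning event has positive probability.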

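To cast (12) in more usable form, we first note that for $L \in \mathbb{N}$ and $k \in \mathbb{N}$, if $k \ge L$ then

$$\left\lfloor \frac{k}{L} \right\rfloor \ge \frac{k}{2L - 1} \tag{13}$$

(we omit the elementary number-theoretic proof). Using (13), we have

$$\bar\eta_{ij} \le \tilde\theta^{\,j-i}, \qquad \text{for } j \ge i + L \tag{14}$$

where

$$\tilde\theta = \left(1 - (1 - \theta)^L\right)^{1/(2L-1)}.$$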
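The floor inequality whose proof is omitted, and the resulting geometric form of the bound, are easy to sanity-check numerically. The sketch below is only a finite brute-force check over hypothetical ranges, not a proof; the function names are invented for the example.

```python
def floor_inequality_holds(max_L, max_k):
    """Check floor(k / L) >= k / (2L - 1) for all 1 <= L <= max_L, L <= k <= max_k.
    The comparison is done in integers: floor(k / L) * (2L - 1) >= k."""
    return all((k // L) * (2 * L - 1) >= k
               for L in range(1, max_L + 1)
               for k in range(L, max_k + 1))

def theorem_bound(theta, L, k):
    """(1 - (1 - theta)^L)^floor(k / L), the tree bound with k = j - i."""
    return (1 - (1 - theta) ** L) ** (k // L)

def theta_tilde(theta, L):
    """(1 - (1 - theta)^L)^(1 / (2L - 1)), the base of the geometric bound."""
    return (1 - (1 - theta) ** L) ** (1 / (2 * L - 1))

checked = floor_inequality_holds(30, 500)

# For k >= L the geometric form dominates the floor form.
theta, L = 0.5, 3
geometric_dominates = all(
    theorem_bound(theta, L, k) <= theta_tilde(theta, L) ** k + 1e-12
    for k in range(L, 60)
)
```

For $L = 1$ the function `theta_tilde` returns $\theta$ itself, which is the chain case discussed next.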

The bounds in (3) and (4) are for different metric spaces and therefore not readily comparable
(the result in (3) has the additional convexity assumption; see 0] for a discussion). For the case
where (14) holds, Samson's bound [S] yields

$$\|\Gamma\|_2 \lesssim \frac{1}{1 - \tilde\theta^{1/2}}, \tag{15}$$

and the approximation

$$\|\Delta\|_\infty \lesssim \sum_{k=0}^{\infty} \tilde\theta^{\,k} = \frac{1}{1 - \tilde\theta} \tag{16}$$

holds trivially via (5). In the (degenerate) case where the Markov tree is a chain, we have $L = 1$
and therefore $\tilde\theta = \theta$; thus we recover the Markov chain concentration results in [HI [H] and the
approximations in (15), (16) become precise inequalities.

Remark 2.2. The bounds in (15) and (16) are approximate because (14) does not hold for all
$j > i$ but only starting with $j \ge i + L$. The difference between $(1 - (1 - \theta)^L)^{\lfloor (j-i)/L \rfloor}$ and
$\tilde\theta^{\,j-i}$ for $i < j < i + L$ is at most $1 - \theta^{L-1}$ and affects only a fixed finite number ($L - 1$) of
entries in each row of $\Gamma$ and $\Delta$. Since $\|\cdot\|_2$ and $\|\cdot\|_\infty$ are continuous functionals, we are justified
in claiming the approximate bound, which may be quantified if an application calls for it. The
statements in (15) and (16) are only meant to convey an order of magnitude.

2.4 Proof of main result 

The proof of Theorem 2.1 is a combination of elementary graph theory and tensor algebra. We
start with a graph-theoretic lemma:
Lemma 2.3. Let $T = ([n], E)$ be a tree and fix $1 \le i < j \le n$. Suppose $(X_i)_{1 \le i \le n}$ is a Markov
tree process whose law $P$ on $S^n$ is compatible with $T$ (in the sense of §2.2.2). Define the set

$$T_i^j = T_i \cap \{j, j+1, \ldots, n\},$$

consisting of those nodes in the subtree $T_i$ whose breadth-first numbering does not precede $j$.
Then, for $y \in S^{i-1}$ and $w, w' \in S$, we have

$$\eta_{ij}(y, w, w') = \begin{cases} 0 & T_i^j = \emptyset \\ \eta_{i j_0}(y, w, w') & \text{otherwise,} \end{cases} \tag{17}$$

where $j_0$ is the minimum (with respect to $<$) element of $T_i^j$.

Remark 2.4. This lemma tells us that when computing $\eta_{ij}$ it is sufficient to restrict our attention
to the subtree induced by $i$.

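Proof. The case $j \in T_i$ implies $j_0 = j$ and is trivial; thus we assume $j \notin T_i$. In this case, the
subtrees $T_i$ and $T_j$ are disjoint. Putting $\bar T_i = T_i \setminus \{i\}$, we have by the Markov property,

$$P\{X_{\bar T_i} = x_{\bar T_i}, X_{T_j} = x_{T_j} \mid X_1^i = yw\} = P\{X_{\bar T_i} = x_{\bar T_i} \mid X_i = w\}\, P\{X_{T_j} = x_{T_j} \mid X_1^{i-1} = y\}.$$

Then from (2) and (7), and by marginalizing out the $X_{T_j}$, we have

$$\eta_{ij}(y, w, w') = \frac12 \sum_{x_j^n} \left| P\{X_j^n = x_j^n \mid X_1^i = yw\} - P\{X_j^n = x_j^n \mid X_1^i = yw'\} \right|$$

$$= \frac12 \sum_{x_{T_i^j}} \left| P\{X_{T_i^j} = x_{T_i^j} \mid X_i = w\} - P\{X_{T_i^j} = x_{T_i^j} \mid X_i = w'\} \right|.$$

If $T_i^j = \emptyset$ then obviously $\eta_{ij} = 0$; otherwise, $\eta_{ij} = \eta_{i j_0}$, since $j_0$ is the "first" element of $T_i^j$. □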

Next we develop some basic results for tensor norms; recall that unless specified otherwise,
the norm used in this paper is the total variation norm defined in (7). If $A$ is an $M \times N$
column-stochastic matrix ($A_{ij} \ge 0$ for $1 \le i \le M$, $1 \le j \le N$ and $\sum_{i=1}^{M} A_{ij} = 1$ for all $1 \le j \le N$) and
$\mathbf{u} \in \mathbb{R}^N$ is balanced in the sense that $\sum_{j=1}^{N} u_j = 0$, we have, by the Markov contraction lemma
([2], Lemma B.1),

$$\|A\mathbf{u}\| \le \|A\| \|\mathbf{u}\|, \tag{18}$$

where

$$\|A\| = \max_{1 \le j, j' \le N} \|A_{*j} - A_{*j'}\|, \tag{19}$$

and $A_{*j} = A[\cdot, j]$ denotes the $j$th column of $A$. An immediate consequence of (18) is that $\|\cdot\|$
satisfies

$$\|AB\| \le \|A\| \|B\| \tag{20}$$

for column-stochastic matrices $A \in \mathbb{R}^{M \times N}$ and $B \in \mathbb{R}^{N \times P}$.

Remark 2.5. Note that if $A$ is a column-stochastic matrix then $\|A\| \le 1$, and if additionally $\mathbf{u}$
is balanced then $A\mathbf{u}$ is also balanced.
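The contraction inequality for column-stochastic matrices acting on balanced vectors can be checked on random instances. The following sketch uses plain Python lists; the helper names, the matrix sizes, and the random seed are arbitrary choices made for illustration, and a passing check is of course no substitute for the cited lemma.

```python
import random

def tv(u):
    """Total variation norm: half the sum of absolute values."""
    return 0.5 * sum(abs(a) for a in u)

def matvec(A, u):
    """Ordinary matrix-vector product for a list-of-rows matrix."""
    return [sum(A[i][j] * u[j] for j in range(len(u))) for i in range(len(A))]

def contraction(A):
    """Norm of A: the largest total variation distance between two columns."""
    cols = list(zip(*A))
    return max(tv([a - b for a, b in zip(c1, c2)]) for c1 in cols for c2 in cols)

def random_column_stochastic(M, N, rng):
    """M x N matrix with nonnegative entries whose columns each sum to one."""
    cols = []
    for _ in range(N):
        w = [rng.random() for _ in range(M)]
        s = sum(w)
        cols.append([a / s for a in w])
    return [list(row) for row in zip(*cols)]

def random_balanced(N, rng):
    """Vector of length N whose entries sum to zero."""
    w = [rng.uniform(-1, 1) for _ in range(N)]
    mean = sum(w) / N
    return [a - mean for a in w]

rng = random.Random(0)
contraction_holds = True
for _ in range(200):
    A = random_column_stochastic(4, 5, rng)
    u = random_balanced(5, rng)
    # small tolerance for floating-point rounding
    contraction_holds = contraction_holds and tv(matvec(A, u)) <= contraction(A) * tv(u) + 1e-12
```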
If $\mathbf{u} \in \mathbb{R}^M$ and $\mathbf{v} \in \mathbb{R}^N$, define their tensor product $\mathbf{w} = \mathbf{v} \otimes \mathbf{u}$ by

$$w_{(i,j)} = u_i v_j,$$

where the notation $(\mathbf{v} \otimes \mathbf{u})_{(i,j)}$ is used to distinguish the 2-tensor $\mathbf{w}$ from an $M \times N$ matrix. The
tensor $\mathbf{w}$ is a vector in $\mathbb{R}^{MN}$ indexed by pairs $(i, j) \in [M] \times [N]$; its norm is naturally defined to
be

$$\|\mathbf{w}\| = \frac12 \sum_{(i,j) \in [M] \times [N]} |w_{(i,j)}|. \tag{21}$$

The following "tensorizing" lemma will play a key role in deriving our bound (we suppress
the boldfaced vector notation for readability):

Lemma 2.6. Consider two finite sets $\mathcal{X}$, $\mathcal{Y}$, with probability measures $p$, $p'$ on $\mathcal{X}$ and $q$, $q'$ on $\mathcal{Y}$.
Then

$$\|p \otimes q - p' \otimes q'\| \le \|p - p'\| + \|q - q'\| - \|p - p'\| \|q - q'\|. \tag{22}$$

Remark 2.7. Note that $p \otimes q$ is a 2-tensor in $\mathbb{R}^{\mathcal{X} \times \mathcal{Y}}$ and a probability measure on $\mathcal{X} \times \mathcal{Y}$.
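Before turning to the proof, the tensorizing inequality can be probed numerically. The sketch below draws random probability vectors and records the slack between the two sides; the helper names, alphabet sizes, and seed are hypothetical choices for the example.

```python
import random

def tv(u):
    """Total variation norm: half the sum of absolute values."""
    return 0.5 * sum(abs(a) for a in u)

def diff(u, v):
    return [a - b for a, b in zip(u, v)]

def tensor(p, q):
    """Product measure p (x) q, flattened into a single list."""
    return [a * b for a in p for b in q]

def random_dist(size, rng):
    w = [rng.random() for _ in range(size)]
    s = sum(w)
    return [a / s for a in w]

def lemma_slack(p, p2, q, q2):
    """Right-hand side minus left-hand side of the tensorizing inequality;
    the lemma asserts this is nonnegative."""
    a = tv(diff(p, p2))
    b = tv(diff(q, q2))
    lhs = tv(diff(tensor(p, q), tensor(p2, q2)))
    return a + b - a * b - lhs

rng = random.Random(0)
min_slack = min(
    lemma_slack(random_dist(3, rng), random_dist(3, rng),
                random_dist(4, rng), random_dist(4, rng))
    for _ in range(500)
)

# The inequality is tight when p, p' are distinct point masses and q = q'.
tight_case = lemma_slack([1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.5, 0.5])
```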

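Proof. Fix $q$, $q'$ and define the function

$$F(u, v) = \sum_{x \in \mathcal{X}} |u_x - v_x| + \|q - q'\| \Big(2 - \sum_{x \in \mathcal{X}} |u_x - v_x|\Big) - \sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} |u_x q_y - v_x q'_y|$$

over the convex polytope $U \subset \mathbb{R}^{\mathcal{X}} \times \mathbb{R}^{\mathcal{X}}$,

$$U = \Big\{(u, v) : u_x, v_x \ge 0, \ \sum_x u_x = \sum_x v_x = 1\Big\};$$

note that proving the claim is equivalent to showing that $F \ge 0$ on $U$.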
For any $\sigma \in \{-1, +1\}^{\mathcal{X}}$ let

$$U_\sigma = \{(u, v) \in U : \mathrm{sgn}(u_x - v_x) = \sigma_x\};$$

note that $U_\sigma$ is a convex polytope and that $U = \bigcup_{\sigma \in \{-1,+1\}^{\mathcal{X}}} U_\sigma$.³
Pick an arbitrary $\tau \in \{-1, +1\}^{\mathcal{X} \times \mathcal{Y}}$ and define

$$F_\sigma(u, v) = \sum_x \sigma_x (u_x - v_x) + \|q - q'\| \Big(2 - \sum_x \sigma_x (u_x - v_x)\Big) - \sum_{x, y} \tau_{xy} (u_x q_y - v_x q'_y)$$

over $U_\sigma$. Since $\sigma_x (u_x - v_x) = |u_x - v_x|$ and $\tau_{xy}$ can be chosen (for any given $u, v, q, q'$) so that
$\tau_{xy} (u_x q_y - v_x q'_y) = |u_x q_y - v_x q'_y|$, the claim that $F \ge 0$ on $U$ will follow if we can show that
$F_\sigma \ge 0$ on $U_\sigma$.

Observe that $F_\sigma$ is affine in its arguments $(u, v)$ and recall that an affine function achieves
its extreme values on the extreme points of a convex domain. Thus to verify that $F_\sigma \ge 0$ on
$U_\sigma$, we need only check the value of $F_\sigma$ on the extreme points of $U_\sigma$. The extreme points of
$U_\sigma$ are pairs $(u, v)$ such that, for some $x', x'' \in \mathcal{X}$, $u = \delta(x')$ and $v = \delta(x'')$, where $\delta(x_0) \in \mathbb{R}^{\mathcal{X}}$ is
given by $[\delta(x_0)]_x = \mathbb{1}_{\{x = x_0\}}$. Let $(u, v)$ be an extreme point of $U_\sigma$. The case $u = v$ is trivial, so
assume $u \ne v$. In this case, $\sum_{x \in \mathcal{X}} \sigma_x (u_x - v_x) = 2$ and

$$\Big| \sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} \tau_{xy} (u_x q_y - v_x q'_y) \Big| \le \sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} |u_x q_y - v_x q'_y| \le 2.$$

This shows that $F_\sigma \ge 0$ on $U_\sigma$ and completes the proof. □



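3 We define $\mathrm{sgn}(z) = \mathbb{1}_{\{z \ge 0\}} - \mathbb{1}_{\{z < 0\}}$. Note that the constraint $\sum_{x \in \mathcal{X}} u_x = \sum_{x \in \mathcal{X}} v_x = 1$ forces $U_\sigma = \{(u, v) \in
U : u = v\}$ when $\sigma_x = +1$ for all $x \in \mathcal{X}$ and $U_\sigma = \emptyset$ when $\sigma_x = -1$ for all $x \in \mathcal{X}$. Both of these cases are trivial.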
To develop a convenient tensor notation, we will fix the index set $V = \{1, \ldots, n\}$. For $I \subset V$,
a tensor indexed by $I$ is a vector $\mathbf{u} \in \mathbb{R}^{S^I}$. A special case of such an $I$-tensor is the product
$\mathbf{u} = \bigotimes_{i \in I} \mathbf{v}^{(i)}$, where $\mathbf{v}^{(i)} \in \mathbb{R}^S$ and

$$\mathbf{u}[x_I] = \prod_{i \in I} \mathbf{v}^{(i)}[x_i]$$

for each $x_I \in S^I$. To gain more familiarity with the notation, let us write the total variation
norm of an $I$-tensor:

$$\|\mathbf{u}\| = \frac12 \sum_{x_I \in S^I} |\mathbf{u}[x_I]|. \tag{23}$$

In order to extend Lemma 2.6 to product tensors, we will need to define the function $\alpha_k : \mathbb{R}^k \to \mathbb{R}$
and state some of its properties:

Lemma 2.8. Define $\alpha_k : \mathbb{R}^k \to \mathbb{R}$ recursively as $\alpha_1(x) = x$ and

$$\alpha_{k+1}(x_1, x_2, \ldots, x_{k+1}) = x_{k+1} + (1 - x_{k+1})\, \alpha_k(x_1, x_2, \ldots, x_k). \tag{24}$$

Then

(a) $\alpha_k$ is symmetric in its $k$ arguments, so it is well-defined as a mapping

$$\alpha : \{x_i : 1 \le i \le k\} \mapsto \mathbb{R}$$

from finite real sets to the reals

(b) $\alpha_k$ takes $[0, 1]^k$ to $[0, 1]$ and is monotonically increasing in each argument on $[0, 1]^k$

(c) if $B \subset C \subset [0, 1]$ are finite sets then $\alpha(B) \le \alpha(C)$

(d) $\alpha_k(x, x, \ldots, x) = 1 - (1 - x)^k$

(e) if $B$ is finite and $1 \in B \subset [0, 1]$ then $\alpha(B) = 1$

(f) if $B \subset [0, 1]$ is a finite set then $\alpha(B) \le \sum_{x \in B} x$.

Remark 2.9. In light of (a), we will use the notation $\alpha_k(x_1, x_2, \ldots, x_k)$ and $\alpha(\{x_i : 1 \le i \le k\})$
interchangeably, as dictated by convenience.
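The recursively defined function of Lemma 2.8 is short enough to implement directly, which makes several of the listed properties easy to spot-check. The function name and the test values below are hypothetical; the loop simply unrolls the recursion from the first argument onward.

```python
from itertools import permutations

def alpha(xs):
    """Evaluate the recursion: start from the first argument and fold in each
    further argument x via value <- x + (1 - x) * value."""
    xs = list(xs)
    value = xs[0]
    for x in xs[1:]:
        value = x + (1 - x) * value
    return value

# Two arguments: x1 + x2 - x1 * x2
two_args = alpha([0.2, 0.5])

# Property (a): every ordering of the arguments gives the same value.
values = [alpha(p) for p in permutations([0.1, 0.4, 0.7])]
symmetric = max(values) - min(values) < 1e-12

# Property (d): equal arguments give 1 - (1 - x)^k.
equal_args = alpha([0.3] * 4)

# Property (e): an argument equal to 1 forces the value 1.
with_one = alpha([0.3, 1.0, 0.2])

# Property (f): the value never exceeds the sum of the arguments.
below_sum = alpha([0.1, 0.4, 0.7]) <= 0.1 + 0.4 + 0.7
```

Equivalently, on $[0,1]^k$ the value is one minus the product of the complements $1 - x_i$, which is one way to see the symmetry at a glance.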

Proof. Claims (a), (b), (e), (f) are straightforward to verify from the recursive definition of $\alpha$
and induction. Claim (c) follows from (b) since

$$\alpha_{k+1}(x_1, x_2, \ldots, x_k, 0) = \alpha_k(x_1, x_2, \ldots, x_k)$$

and (d) is easily derived from the binomial expansion of $(1 - x)^k$. □



The function $\alpha_k$ is the natural generalization of $\alpha_2(x_1, x_2) = x_1 + x_2 - x_1 x_2$ to $k$ variables,
and it is what we need for the analogue of Lemma 2.6 for a product of $k$ tensors:

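Corollary 2.10. Let $\{\mathbf{u}^{(i)}\}_{i \in I}$ and $\{\mathbf{v}^{(i)}\}_{i \in I}$ be two sets of tensors and assume that each of
$\mathbf{u}^{(i)}, \mathbf{v}^{(i)}$ is a probability measure on $S$. Then we have

$$\Big\| \bigotimes_{i \in I} \mathbf{u}^{(i)} - \bigotimes_{i \in I} \mathbf{v}^{(i)} \Big\| \le \alpha\left\{ \|\mathbf{u}^{(i)} - \mathbf{v}^{(i)}\| : i \in I \right\}. \tag{25}$$

Proof. Pick an $i_0 \in I$ and let

$$p = \mathbf{u}^{(i_0)}, \qquad p' = \mathbf{v}^{(i_0)}, \qquad q = \bigotimes_{i_0 \ne i \in I} \mathbf{u}^{(i)}, \qquad q' = \bigotimes_{i_0 \ne i \in I} \mathbf{v}^{(i)}.$$

Apply Lemma 2.6 to $\|p \otimes q - p' \otimes q'\|$ and proceed by induction on $|I|$, using the recursion (24). □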
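The product version of the tensorizing bound can be spot-checked in the same way as the two-factor lemma. In this sketch the number of factors, the alphabet size, the seed, and the helper names are arbitrary illustrative choices; the function `alpha` unrolls the recursion of Lemma 2.8.

```python
import random

def tv(u):
    """Total variation norm: half the sum of absolute values."""
    return 0.5 * sum(abs(a) for a in u)

def tensor(dists):
    """Flattened product measure of a list of distributions."""
    out = [1.0]
    for d in dists:
        out = [a * b for a in out for b in d]
    return out

def alpha(xs):
    """Recursion of Lemma 2.8; starting from 0 reproduces alpha_1(x) = x."""
    value = 0.0
    for x in xs:
        value = x + (1 - x) * value
    return value

def random_dist(size, rng):
    w = [rng.random() for _ in range(size)]
    s = sum(w)
    return [a / s for a in w]

def corollary_slack(us, vs):
    """alpha of the factor-wise distances minus the distance between the
    products; the corollary asserts this is nonnegative."""
    lhs = tv([a - b for a, b in zip(tensor(us), tensor(vs))])
    rhs = alpha([tv([a - b for a, b in zip(u, v)]) for u, v in zip(us, vs)])
    return rhs - lhs

rng = random.Random(1)
min_slack = min(
    corollary_slack([random_dist(3, rng) for _ in range(4)],
                    [random_dist(3, rng) for _ in range(4)])
    for _ in range(300)
)

# Tight case: one factor differs completely, the other factor is identical.
tight_case = corollary_slack([[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [1.0, 0.0]])
```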
Our final generalization concerns linear operators over $I$-tensors. An $I,J$-matrix $A$ has
dimensions $|S^J| \times |S^I|$ and takes an $I$-tensor $\mathbf{u}$ to a $J$-tensor $\mathbf{v}$: for each $y_J \in S^J$, we have

$$\mathbf{v}[y_J] = \sum_{x_I \in S^I} A[y_J, x_I]\, \mathbf{u}[x_I], \tag{26}$$

which we write as $A\mathbf{u} = \mathbf{v}$. If $A$ is an $I,J$-matrix and $B$ is a $J,K$-matrix, the matrix product
$BA$ is defined analogously to (26).

As a special case, an $I,J$-matrix might factorize as a tensor product of $|S| \times |S|$ matrices
$A^{(i,j)} \in \mathbb{R}^{S \times S}$. We will write such a factorization in terms of a bipartite graph⁴ $G = (I + J, E)$,
where $E \subset I \times J$ and the factors $A^{(i,j)}$ are indexed by $(i, j) \in E$:

$$A = \bigotimes_{(i,j) \in E} A^{(i,j)}, \tag{27}$$

where

$$A[y_J, x_I] = \prod_{(i,j) \in E} A^{(i,j)}_{y_j, x_i}$$

for all $x_I \in S^I$ and $y_J \in S^J$. The norm of an $I,J$-matrix is a natural generalization of the
matrix norm defined in (19):

$$\|A\| = \max_{x_I, x'_I \in S^I} \|A[\cdot, x_I] - A[\cdot, x'_I]\| \tag{28}$$

where $\mathbf{u} = A[\cdot, x_I]$ is the $J$-tensor given by

$$\mathbf{u}[y_J] = A[y_J, x_I];$$

(28) is well-defined via the tensor norm in (23). Since $I,J$-matrices act on $I$-tensors by ordinary
matrix multiplication, $\|A\mathbf{u}\| \le \|A\| \|\mathbf{u}\|$ continues to hold when $A$ is a column-stochastic
$I,J$-matrix and $\mathbf{u}$ is a balanced $I$-tensor; if, additionally, $B$ is a column-stochastic $J,K$-matrix,
$\|BA\| \le \|B\| \|A\|$ also holds. Likewise, since another way of writing (27) is

$$A[\cdot, x_I] = \bigotimes_{(i,j) \in E} A^{(i,j)}[\cdot, x_i],$$

Corollary 2.10 extends to tensor products of matrices:

Lemma 2.11. Fix index sets $I$, $J$ and a bipartite graph $(I + J, E)$. Let $\{A^{(i,j)}\}_{(i,j) \in E}$ be a
collection of column-stochastic $|S| \times |S|$ matrices, whose tensor product is the $I,J$-matrix

$$A = \bigotimes_{(i,j) \in E} A^{(i,j)}.$$

Then

$$\|A\| \le \alpha\left\{ \|A^{(i,j)}\| : (i, j) \in E \right\}.$$

We are now in a position to state the main technical lemma, from which Theorem 2.1 will
follow straightforwardly:

4 Our notation for bipartite graphs is standard; it is equivalent to $G = (I \sqcup J, E)$ where $I$ and $J$ are always assumed
to be disjoint.
Lemma 2.12. Let $S$ be a finite set and let $(X_i)_{1 \le i \le n}$, $X_i \in S$, be a Markov tree process, defined
by a tree $T = (V, E)$ and transition kernels $p_0$, $\{p_{uv}(\cdot \mid \cdot)\}_{(u,v) \in E}$. Let the $(u,v)$-contraction
coefficient $\theta_{uv}$ be as defined in (11).

Fix $1 \le i < j \le n$ and let $j_0 = j_0(i, j)$ be as defined in Lemma 2.3 (we are assuming its
existence, for otherwise $\bar\eta_{ij} = 0$). Then we have

$$\bar\eta_{ij} \le \prod_{d = \mathrm{dep}_T(i) + 1}^{\mathrm{dep}_T(j_0)} \alpha\{\theta_{uv} : v \in \mathrm{lev}_T(d)\} \tag{29}$$

where $\mathrm{dep}_T(\cdot)$ is defined in §2.2.1.

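Proof. For $y \in S^{i-1}$ and $w, w' \in S$, we have

$$\eta_{ij}(y, w, w') = \frac12 \sum_{x_j^n} \left| P\{X_j^n = x_j^n \mid X_1^i = yw\} - P\{X_j^n = x_j^n \mid X_1^i = yw'\} \right| \tag{30}$$

$$= \frac12 \sum_{x_j^n} \Big| \sum_{z_{i+1}^{j-1}} \left( P\{X_{i+1}^n = [z_{i+1}^{j-1} x_j^n] \mid X_1^i = yw\} - P\{X_{i+1}^n = [z_{i+1}^{j-1} x_j^n] \mid X_1^i = yw'\} \right) \Big|. \tag{31}$$

Let $T_i$ be the subtree induced by $i$ and

$$Z = T_i \cap \{i + 1, \ldots, j_0 - 1\} \quad \text{and} \quad C = \{v \in T_i : (u, v) \in E, \ u < j_0, \ v \ge j_0\}. \tag{32}$$

Then by Lemma 2.3 and the Markov property, we get

$$\eta_{ij}(y, w, w') = \frac12 \sum_{x[C]} \Big| \sum_{x[Z]} \left( P\{X[C \cup Z] = x[C \cup Z] \mid X_i = w\} - P\{X[C \cup Z] = x[C \cup Z] \mid X_i = w'\} \right) \Big| \tag{33}$$

(the sum indexed by $\{j_0, \ldots, n\} \setminus C$ marginalizes out).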

Define $D = \{d_k : k = 0, \ldots, |D|\}$ with $d_0 = \mathrm{dep}_T(i)$, $d_{|D|} = \mathrm{dep}_T(j_0)$ and $d_{k+1} = d_k + 1$ for
$0 \le k < |D|$. For $d \in D$, let $I_d = T_i \cap \mathrm{lev}_T(d)$ and $G_d = (I_{d-1} + I_d, E_d)$ be the bipartite graph
consisting of the nodes in $I_{d-1}$ and $I_d$, and the edges in $E$ joining them (note that $I_{d_0} = \{i\}$).

For $(u, v) \in E$, let $A^{(u,v)}$ be the $|S| \times |S|$ matrix given by

$$A^{(u,v)}_{x, x'} = p_{uv}(x \mid x')$$

and note that $\|A^{(u,v)}\| = \theta_{uv}$. Then by the Markov property, for each $z[I_d] \in S^{I_d}$ and
$x[I_{d-1}] \in S^{I_{d-1}}$, $d \in D \setminus \{d_0\}$, we have

$$P\{X_{I_d} = z_{I_d} \mid X_{I_{d-1}} = x_{I_{d-1}}\} = A^{(d)}[z_{I_d}, x_{I_{d-1}}],$$

where

$$A^{(d)} = \bigotimes_{(u,v) \in E_d} A^{(u,v)}.$$
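Likewise, for $d \in D \setminus \{d_0\}$,

$$P\{X_{I_d} = x_{I_d} \mid X_i = w\} = \sum_{x'_{I_{d_1}}} \sum_{x''_{I_{d_2}}} \cdots \sum_{x^{(d-1)}_{I_{d-1}}} P\{X_{I_{d_1}} = x'_{I_{d_1}} \mid X_i = w\}\, P\{X_{I_{d_2}} = x''_{I_{d_2}} \mid X_{I_{d_1}} = x'_{I_{d_1}}\} \cdots P\{X_{I_d} = x_{I_d} \mid X_{I_{d-1}} = x^{(d-1)}_{I_{d-1}}\}$$

$$= \left(A^{(d)} A^{(d-1)} \cdots A^{(d_1)}\right)[x_{I_d}, w]. \tag{34}$$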

Define the (balanced) $I_{d_1}$-tensor

$$\mathbf{h} = A^{(d_1)}[\cdot, w] - A^{(d_1)}[\cdot, w'], \tag{35}$$

the $I_{d_{|D|}}$-tensor

$$\mathbf{f} = A^{(d_{|D|})} A^{(d_{|D|-1})} \cdots A^{(d_2)} \mathbf{h}, \tag{36}$$

and $C_0, C_1, Z_0 \subset \{1, \ldots, n\}$:

$$C_0 = C \cap I_{\mathrm{dep}_T(j_0)}, \qquad C_1 = C \setminus C_0, \qquad Z_0 = I_{\mathrm{dep}_T(j_0)} \setminus C_0, \tag{37}$$

where $C$ and $Z$ are defined in (32). For readability we will write $P(x_U \mid \cdot)$ instead of $P\{X_U = x_U \mid \cdot\}$
below; no ambiguity should arise. Combining (33) and (34), we have



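$$\eta_{ij}(y, w, w') = \frac12 \sum_{x[C_0]} \sum_{x[C_1]} \Big| \sum_{x[Z_0]} P(x[C_1] \mid x[Z_0])\, \mathbf{f}[x[C_0 \cup Z_0]] \Big| = \|B\mathbf{f}\| \tag{38-40}$$

where $B$ is the $|S^{C_0 \cup C_1}| \times |S^{C_0 \cup Z_0}|$ column-stochastic matrix given by

$$B[x_{C_0} \cup x_{C_1}, x'_{C_0} \cup x_{Z_0}] = \mathbb{1}_{\{x_{C_0} = x'_{C_0}\}}\, P(x_{C_1} \mid x_{Z_0})$$

with the convention that $P(x_{C_1} \mid x_{Z_0}) = 1$ if either of $\{Z_0, C_1\}$ is empty. The claim now follows
by reading off the results previously obtained:

$$\begin{aligned}
\|B\mathbf{f}\| &\le \|B\| \|\mathbf{f}\| && \text{Eq. (18)} \\
&\le \|\mathbf{f}\| && \text{Remark 2.5} \\
&\le \|\mathbf{h}\| \prod_{k=2}^{|D|} \|A^{(d_k)}\| && \text{Eqs. (18), (36)} \\
&\le \prod_{k=1}^{|D|} \alpha\left\{ \|A^{(u,v)}\| : (u, v) \in E_{d_k} \right\} && \text{Lemma 2.11}
\end{aligned}$$

□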



Proof of Theorem 2.1. We will borrow the definitions from the proof of Lemma 2.12. To
upper-bound $\bar\eta_{ij}$ we first bound $\alpha\{\|A^{(u,v)}\| : (u, v) \in E_{d_k}\}$. Since

$$|E_{d_k}| \le \mathrm{wid}(T) \le L$$

(because every node in $I_{d_k}$ has exactly one parent in $I_{d_{k-1}}$) and

$$\|A^{(u,v)}\| = \theta_{uv} \le \theta < 1,$$

we appeal to Lemma 2.8 to obtain

$$\alpha\left\{ \|A^{(u,v)}\| : (u, v) \in E_{d_k} \right\} \le 1 - (1 - \theta)^L. \tag{41}$$

Now we must lower-bound the quantity $h = \mathrm{dep}_T(j_0) - \mathrm{dep}_T(i)$. Since every level can have up
to $L$ nodes, we have

$$j_0 - i \le hL$$

and so $h \ge \lfloor (j_0 - i)/L \rfloor \ge \lfloor (j - i)/L \rfloor$. □



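The calculations in Lemma 2.12 yield considerably more information than the simple bound
in (12). For example, suppose the tree $T$ has levels $\{I_d : d = 0, 1, \ldots\}$ with the property that
the levels are growing at most linearly:

$$|I_d| \le cd$$

for some $c > 0$. Let $d_i = \mathrm{dep}_T(i)$, $d_j = \mathrm{dep}_T(j_0)$, and $h = d_j - d_i$. Then

$$\begin{aligned}
j - i \le j_0 - i &\le c \sum_{k = d_i + 1}^{d_j} k \\
&= \frac{c}{2}\left(d_j(d_j + 1) - d_i(d_i + 1)\right) \\
&< \frac{c}{2}\left((d_j + 1)^2 - d_i^2\right) \\
&\le \frac{c}{2}(d_i + h + 1)^2
\end{aligned}$$

so

$$h > \sqrt{2(j - i)/c} - d_i - 1,$$

which yields the bound, via Lemma 2.8(f),

$$\bar\eta_{ij} \le \prod_{k=1}^{h} \sum_{(u,v) \in E_k} \theta_{uv}. \tag{42}$$

Let $\theta_k = \max\{\theta_{uv} : (u, v) \in E_k\}$; then if $ck\theta_k \le \beta$ holds for some $\beta \in \mathbb{R}$, this becomes

$$\bar\eta_{ij} \le \prod_{k=1}^{h} (ck\theta_k) \le \prod_{k=1}^{\sqrt{2(j-i)/c} - d_i - 1} (ck\theta_k). \tag{43}$$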

This is a non-trivial bound for trees with linearly growing levels: recall that to bound $\|\Delta\|_\infty$ (5),
we must bound the series

$$\sum_{j=i+1}^{\infty} \bar\eta_{ij}.$$

By the limit comparison test with the series $\sum_{j=1}^{\infty} 1/j^2$, we have that

$$\sum_{j=i+1}^{\infty} \beta^{\sqrt{2(j-i)/c} - d_i - 1}$$

converges for $\beta < 1$. Similar techniques may be applied when the level growth is bounded by
other slowly increasing functions.



3 Discussion 

We have presented a concentration of measure bound for Markov tree processes; to our knowledge,
this is the first such result.⁵ In the simple case of the contracting, bounded-width Markov
tree processes (i.e., those for which $\mathrm{wid}(T) \le L < \infty$ and $\sup_{(u,v) \in E} \theta_{uv} \le \theta < 1$), the bound takes
on a particularly tractable form (12), and in the degenerate case $L = 1$ it reduces to the sharpest
known bound for Markov chains. The techniques we develop extend well beyond the somewhat
restrictive contracting, bounded-width case, as demonstrated in the calculation in (43).

The technical results in §2.4, particularly Lemma 2.6 and its generalizations, might be of
independent interest. It is hoped that these techniques will be extended to obtain concentration
bounds for larger classes of directed acyclic graphical models.